Column

Overview

Visualizing data in high-dimensional space can be messy and unintuitive (Hilbert space, \(\mathbb{R}^p,~~p>3\), where \(p\) is the number of numeric variables). An analysis of high-dimensional data should remain interpretable in terms of the original dimensions and ideally use all of the information in the data.

To these ends we advise the use of projection pursuit as achieved in the R package tourr (2011, H Wickham & D Cook). Further, we implement a method for manual controls following D Cook & A Buja (1997) in an R package, spinifex, currently available with devtools::install_github("nspyrison/spinifex"). We also compare and contrast alternative methodology, namely Principal Component Analysis (PCA, 1901 K Pearson), t-distributed Stochastic Neighbor Embedding (t-SNE, 2008 L van der Maaten & G Hinton), and the holes-optimized tour (an application of projection pursuit, 1974 J Friedman & J Tukey). The grand tour was proposed by D Asimov (1985).

The R package tourr (2011, H Wickham & D Cook) provides a means to animate 2-d projections of a p-dimensional data object as it is rotated. The path of rotation may take the form of a random walk (the grand tour), a predefined path, or a path that optimizes an index by (“semi-”stochastic) gradient descent (projection pursuit, mentioned above).
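To make the idea of a single tour frame concrete, here is a minimal sketch in base R. It draws one random orthonormal 2-d basis and projects the data onto it; a tour animates a sequence of such frames. The helper name `random_basis` is hypothetical (it is not a tourr function), and the numeric columns of `iris` are used only as a stand-in so the sketch runs without extra packages.

```r
# One frame of a tour: project p-dimensional data onto an orthonormal
# 2-d basis. `random_basis` is a hypothetical helper for illustration.
random_basis <- function(p, d = 2) {
  m <- matrix(stats::rnorm(p * d), nrow = p, ncol = d)
  qr.Q(qr(m))  # orthonormalize the columns via a QR decomposition
}

set.seed(1)
x <- scale(as.matrix(datasets::iris[, 1:4]))  # stand-in data, p = 4
basis <- random_basis(ncol(x))                # p x 2 projection basis
proj <- x %*% basis                           # n x 2 projected coordinates

# Each column of `basis` shows how much every original variable
# contributes to the corresponding plot axis.
plot(proj, xlab = "projection 1", ylab = "projection 2", asp = 1)
```

Because the projection is linear, each axis of the plot remains a weighted combination of the original variables, which is what keeps the view interpretable.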

\(Work~in~progress,~~TODO:~add~to,~cleanup\)

References

H. Wickham, D. Cook, H. Hofmann, and A. Buja (2011). tourr: An R package for exploring multivariate data with projections. Journal of Statistical Software, 40(2). http://www.jstatsoft.org/v40

D. Asimov (1985). The grand tour: A tool for viewing multidimensional data. SIAM Journal on Scientific and Statistical Computing, 6(1), 128–143.

D. Cook and A. Buja (1997). Manual controls for high-dimensional data projections. Journal of Computational and Graphical Statistics, 6(4), 464–480. https://doi.org/10.2307/1390747

H. Wickham, D. Cook, and H. Hofmann (2015). Visualising statistical models: Removing the blindfold (with discussion). Statistical Analysis and Data Mining, 8(4), 203–225.

Thanks

Prof. Dianne Cook - Guidance, inspiration, and contributions to projection pursuit

Dr. Ursula Laa - Collaboration, use cases, and development feedback

Other reading

Principal Component Analysis | t-distributed Stochastic Neighbor Embedding
Projection pursuit | Grand Tour | Spinifex Hopping Mouse

Column

Tourr

Spinifex

\(TODO:~scale~output~of~spinifex::proj_data(),~case~handling~for~spinifex::slideshow(),~apply~Phys~data.\)

PCA, t-SNE, PP

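| Method      | Interpretable | Lossy | Global | Overfittable | NonLinearData |
|-------------|---------------|-------|--------|--------------|---------------|
| PCA         | TRUE          | TRUE  | TRUE   | FALSE        | FALSE         |
| t-SNE       | FALSE         | NA    | FALSE  | TRUE         | TRUE          |
| Tour, holes | TRUE          | FALSE | TRUE   | FALSE        | FALSE         |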
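One way to put a number on the Lossy entry for PCA is the share of total variance discarded when only the first two principal components are plotted. The following is a minimal sketch of that calculation, not the method used for the figures here. It again uses the numeric columns of `iris` as a stand-in, and the variable names are illustrative.

```r
# Stand-in data: numeric columns of iris (the flea data needs tourr).
x <- datasets::iris[, 1:4]
pca <- stats::prcomp(x, center = TRUE, scale. = TRUE)

# Proportion of total variance carried by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Variance that is not shown in a plot of the first two components.
lost_2d <- 1 - sum(var_explained[1:2])
round(lost_2d, 3)
```

A tour does not discard components in this way, since the animation eventually visits projections involving every variable.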

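library(tourr)    # provides the flea data, guided_tour() and holes()
f <- flea[, 1:6]  # numeric measurements only; the species column is dropped

# PCA: plot the scores (f.pca$x), not the prcomp object itself
f.pca <- stats::prcomp(f)
ggplot2::ggplot(as.data.frame(f.pca$x)) + ...

# t-SNE: Rtsne() returns a list, the embedding is in $Y
# (with 74 rows the default perplexity is likely too large; set it lower)
f.tsne <- Rtsne::Rtsne(f, ...)
f.tsne.pca <- stats::prcomp(f.tsne$Y)
ggplot2::ggplot(as.data.frame(f.tsne.pca$x)) + ...

# Guided tour on the holes index (recent tourr versions use holes())
# Check what animate_xy() returns in your version before plotting it
f.holes_end <- tourr::animate_xy(f, tourr::guided_tour(tourr::holes()))
ggplot2::ggplot(f.holes_end) + ...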
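The guided tour above steers towards projections that score highly on the holes index, which rewards views with few points near the center. Below is a minimal sketch of such an index, assuming the commonly used form for sphered data; it is believed to be close to what tourr computes, but the package source is the authority. The name `holes_index` and the two toy data sets are hypothetical.

```r
# Sketch of a holes index for an n x d matrix of projected data.
# Assumes the data were sphered before projection.
holes_index <- function(proj) {
  n <- nrow(proj)
  d <- ncol(proj)
  # Average Gaussian weight of the points: large when points sit near 0.
  central_mass <- sum(exp(-0.5 * rowSums(proj^2))) / n
  (1 - central_mass) / (1 - exp(-d / 2))
}

set.seed(1)
blob <- matrix(stats::rnorm(200, sd = 0.3), ncol = 2)  # mass at the center
theta <- stats::runif(100, 0, 2 * pi)
ring <- cbind(2 * cos(theta), 2 * sin(theta))          # empty center

# The ring, with its hole in the middle, scores higher than the blob.
holes_index(blob) < holes_index(ring)
```

A guided tour repeatedly proposes nearby bases and keeps those that increase this value, so the animation drifts towards views where clusters separate around an empty center.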

Data set - flea. Consists of 74 observations of 6 length measurements taken across 3 different species of flea beetles. Within the graphics, species is used to select color and point character, but the methods discussed are all unsupervised (they cannot use species). Data from A Lubischew (1962), analogous to R Fisher’s Iris data [150x5] (1936). The flea dataset is available in the tourr and spinifex R packages.

PCA Lossiness